Bike rental is a growing industry, but one that faces both weather-related and seasonal variation in demand. It is important to quantify how these variations affect demand because demand, in turn, affects how many employees and how many bikes need to be available. Significant mismatches in bike availability or staffing can result either in missed rental opportunities (no bike available, or too few employees to handle rentals promptly) or in unnecessary expenses for the rental company in the form of wages or bike purchases.
Using a publicly available dataset (described below), I attempt to answer two basic questions:
Q1: What are the factors that determine demand?
Q2: Do these factors vary by season?
The dataset consists of 17,379 hourly rental records collected over a two-year period (2011 and 2012). For this analysis the records have been aggregated into 731 daily records, because the number of bikes and employees needed is determined day by day.
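The hourly-to-daily aggregation described above can be sketched with base R's aggregate(). This is a minimal sketch, not necessarily how the daily file was produced: it assumes a hypothetical hourly data frame whose columns use this report's names (date, casual, registered, count), and it only sums the rental counts.

```r
# Minimal sketch: sum hourly rental counts into one row per day.
# Assumes a hypothetical hourly data frame whose columns use this
# report's names (date, casual, registered, count).
daily_totals <- function(hourly) {
  aggregate(cbind(casual, registered, count) ~ date,
            data = hourly, FUN = sum)
}

# Toy hourly data: three hours spread over two days
hourly <- data.frame(
  date       = c("1/1/11", "1/1/11", "1/2/11"),
  casual     = c(3, 8, 5),
  registered = c(13, 32, 20),
  count      = c(16, 40, 25)
)
daily_totals(hourly)
```

Weather variables such as temp or hum would need a different summary (for example FUN = mean) rather than a sum.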
Data attributes are both categorical and numerical and the statistical analyses vary accordingly.
The data are taken from the UCI Machine Learning Repository. The original data are reported and analyzed in a paper by Fanaee-T, H. and Gama, J.
day <- read.csv('day.csv')
dim(day)
## [1] 731 16
As noted previously, the data consist of 731 daily records with 15 variables per record (the 16th column, instance, is a record number). Here is a list of the data attributes, showing each variable's name, its data type, and the values of the first 10 records:
str(day)
## 'data.frame': 731 obs. of 16 variables:
## $ instance : int 1 2 3 4 5 6 7 8 9 10 ...
## $ date : Factor w/ 731 levels "1/1/11","1/1/12",..: 1 23 45 51 53 55 57 59 61 3 ...
## $ season : int 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 0 0 0 0 0 0 0 0 0 0 ...
## $ month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ holiday : int 0 0 0 0 0 0 0 0 0 0 ...
## $ weekday : int 6 0 1 2 3 4 5 6 0 1 ...
## $ workingday: int 0 0 1 1 1 1 1 0 0 1 ...
## $ conditions: int 2 2 1 1 1 1 2 2 1 1 ...
## $ temp : num 0.344 0.363 0.196 0.2 0.227 ...
## $ felt_temp : num 0.364 0.354 0.189 0.212 0.229 ...
## $ hum : num 0.806 0.696 0.437 0.59 0.437 ...
## $ windspeed : num 0.16 0.249 0.248 0.16 0.187 ...
## $ casual : int 331 131 120 108 82 88 148 68 54 41 ...
## $ registered: int 654 670 1229 1454 1518 1518 1362 891 768 1280 ...
## $ count : int 985 801 1349 1562 1600 1606 1510 959 822 1321 ...
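As str() shows, several categorical attributes (season, year, weekday, conditions) are stored as integer codes. A minimal sketch for turning them into labelled factors follows; the labels are taken from the UCI dataset description, which is an assumption here since this report does not restate the coding. The weekday coding can be checked against the data itself: the first record, 1/1/11, was a Saturday and is coded 6.

```r
# Sketch: turn the integer-coded columns into labelled factors.
# Labels follow the UCI dataset description (an assumption: this
# report does not restate the coding).
label_codes <- function(day) {
  day$season <- factor(day$season, levels = 1:4,
                       labels = c("spring", "summer", "fall", "winter"))
  day$year <- factor(day$year, levels = 0:1, labels = c("2011", "2012"))
  # weekday 0 = Sunday ... 6 = Saturday (1/1/11, a Saturday, is coded 6)
  day$weekday <- factor(day$weekday, levels = 0:6,
                        labels = c("Sun", "Mon", "Tue", "Wed",
                                   "Thu", "Fri", "Sat"))
  # Keep all four weather levels, even ones that never occur
  day$conditions <- factor(day$conditions, levels = 1:4,
                           labels = c("clear", "mist",
                                      "light rain/snow", "heavy rain/snow"))
  day
}

# Toy rows standing in for day.csv
toy <- data.frame(season = c(1, 3), year = c(0, 1),
                  weekday = c(6, 0), conditions = c(2, 1))
str(label_codes(toy))
```

Once these columns are factors, lm() fits a separate effect for each level instead of a single linear slope across the codes.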
summary(day)
## instance date season year
## Min. : 1.0 1/1/11 : 1 Min. :1.000 Min. :0.0000
## 1st Qu.:183.5 1/1/12 : 1 1st Qu.:2.000 1st Qu.:0.0000
## Median :366.0 1/10/11: 1 Median :3.000 Median :1.0000
## Mean :366.0 1/10/12: 1 Mean :2.497 Mean :0.5007
## 3rd Qu.:548.5 1/11/11: 1 3rd Qu.:3.000 3rd Qu.:1.0000
## Max. :731.0 1/11/12: 1 Max. :4.000 Max. :1.0000
## (Other):725
## month holiday weekday workingday
## Min. : 1.00 Min. :0.00000 Min. :0.000 Min. :0.000
## 1st Qu.: 4.00 1st Qu.:0.00000 1st Qu.:1.000 1st Qu.:0.000
## Median : 7.00 Median :0.00000 Median :3.000 Median :1.000
## Mean : 6.52 Mean :0.02873 Mean :2.997 Mean :0.684
## 3rd Qu.:10.00 3rd Qu.:0.00000 3rd Qu.:5.000 3rd Qu.:1.000
## Max. :12.00 Max. :1.00000 Max. :6.000 Max. :1.000
##
## conditions temp felt_temp hum
## Min. :1.000 Min. :0.05913 Min. :0.07907 Min. :0.0000
## 1st Qu.:1.000 1st Qu.:0.33708 1st Qu.:0.33784 1st Qu.:0.5200
## Median :1.000 Median :0.49833 Median :0.48673 Median :0.6267
## Mean :1.395 Mean :0.49538 Mean :0.47435 Mean :0.6279
## 3rd Qu.:2.000 3rd Qu.:0.65542 3rd Qu.:0.60860 3rd Qu.:0.7302
## Max. :3.000 Max. :0.86167 Max. :0.84090 Max. :0.9725
##
## windspeed casual registered count
## Min. :0.02239 Min. : 2.0 Min. : 20 Min. : 22
## 1st Qu.:0.13495 1st Qu.: 315.5 1st Qu.:2497 1st Qu.:3152
## Median :0.18097 Median : 713.0 Median :3662 Median :4548
## Mean :0.19049 Mean : 848.2 Mean :3656 Mean :4504
## 3rd Qu.:0.23321 3rd Qu.:1096.0 3rd Qu.:4776 3rd Qu.:5956
## Max. :0.50746 Max. :3410.0 Max. :6946 Max. :8714
##
hist(day$conditions)
Notice that there are no “4” values and very few “3” values (n = 21): most days are sunny or partly cloudy.
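Because conditions is an integer code, a frequency table shows the counts behind the histogram more directly. Declaring the codes 1 to 4 as factor levels makes table() report the code that never occurs as a zero rather than silently omitting it. The sketch below uses a toy vector; the helper name is hypothetical.

```r
# Count days per weather code, keeping codes that never occur.
# A factor with explicit levels makes table() report zero counts.
count_conditions <- function(conditions) {
  table(factor(conditions, levels = 1:4))
}

# Toy vector; with the real data use count_conditions(day$conditions)
counts <- count_conditions(c(1, 1, 2, 1, 3, 2))
counts
```

For a discrete variable like this, barplot(count_conditions(day$conditions)) is an alternative to hist().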
hist(day$temp)
hist(day$felt_temp)
hist(day$hum)
hist(day$windspeed)
hist(day$casual)
hist(day$registered)
hist(day$count)
plot(day$conditions, day$casual)
plot(day$conditions, day$registered)
plot(day$hum, day$casual)
plot(day$hum, day$registered)
plot(day$windspeed, day$casual)
plot(day$windspeed, day$registered)
plot(day$temp, day$casual)
plot(day$temp, day$registered)
plot(day$felt_temp, day$casual)
plot(day$felt_temp, day$registered)
plot(day$temp, day$felt_temp)
cor(day$temp, day$felt_temp)
## [1] 0.9917016
plot(day$temp, day$hum)
cor.test(day$temp, day$hum)
##
## Pearson's product-moment correlation
##
## data: day$temp and day$hum
## t = 3.456, df = 729, p-value = 0.0005801
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.05495529 0.19765680
## sample estimates:
## cor
## 0.1269629
plot(day$temp, day$windspeed)
cor.test(day$temp, day$windspeed)
##
## Pearson's product-moment correlation
##
## data: day$temp and day$windspeed
## t = -4.3187, df = 729, p-value = 1.787e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2278482 -0.0864203
## sample estimates:
## cor
## -0.1579441
plot(day$hum, day$windspeed)
cor.test(day$hum, day$windspeed)
##
## Pearson's product-moment correlation
##
## data: day$hum and day$windspeed
## t = -6.9265, df = 729, p-value = 9.488e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3153210 -0.1792046
## sample estimates:
## cor
## -0.2484891
plot(day$conditions, day$windspeed)
cor.test(day$conditions, day$windspeed)
##
## Pearson's product-moment correlation
##
## data: day$conditions and day$windspeed
## t = 1.0676, df = 729, p-value = 0.286
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.03309737 0.11170461
## sample estimates:
## cor
## 0.03951106
cor.test(day$conditions, day$windspeed, method='spearman')
## Warning in cor.test.default(day$conditions, day$windspeed, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: day$conditions and day$windspeed
## S = 64014000, p-value = 0.6517
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.01672523
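The warning appears because conditions takes only three distinct values across 731 days, so the ranks contain many ties and R cannot compute an exact p-value; it falls back to an asymptotic approximation. Asking for that approximation explicitly with exact = FALSE gives the same result without the warning. The sketch uses toy data and a hypothetical helper name.

```r
# Sketch: Spearman correlation for an ordinal code with many ties.
# exact = FALSE requests the asymptotic p-value directly, which is what
# cor.test() falls back to (with a warning) when ties are present.
spearman_ties <- function(x, y) {
  cor.test(x, y, method = "spearman", exact = FALSE)
}

# Toy ordinal codes and a continuous variable
codes <- c(1, 1, 2, 2, 3, 1, 2, 3, 1, 2)
hum   <- c(0.40, 0.45, 0.60, 0.70, 0.90, 0.50, 0.65, 0.85, 0.42, 0.62)
res <- spearman_ties(codes, hum)
res$estimate
```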
plot(day$conditions, day$hum)
cor.test(day$conditions, day$hum, method ='spearman')
## Warning in cor.test.default(day$conditions, day$hum, method = "spearman"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: day$conditions and day$hum
## S = 26267000, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.596532
plot(day$conditions, day$temp)
cor.test(day$conditions, day$temp)
##
## Pearson's product-moment correlation
##
## data: day$conditions and day$temp
## t = -3.2802, df = 729, p-value = 0.001087
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1914416 -0.0485129
## sample estimates:
## cor
## -0.1206022
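The pairwise checks above can also be collected into a single correlation matrix with cor(), and pairs() draws the matching scatterplot matrix. The sketch below runs on simulated data so that it is self-contained (an assumption: substitute day for the simulated frame); conditions is left out because it is an ordinal code.

```r
# Sketch: all pairwise Pearson correlations among the weather variables.
# Column names follow this report; the data here are simulated.
weather_cor <- function(df, vars = c("temp", "felt_temp", "hum", "windspeed")) {
  round(cor(df[, vars]), 3)
}

set.seed(42)
toy <- data.frame(temp = runif(50))
toy$felt_temp <- toy$temp + rnorm(50, sd = 0.01)  # nearly a copy of temp
toy$hum <- runif(50)
toy$windspeed <- runif(50)

weather_cor(toy)
```

With the real data, pairs(day[, c("temp", "felt_temp", "hum", "windspeed")]) shows the corresponding scatterplots in one figure.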
NUMBER OF USERS vs. CONDITIONS
cor.test(day$conditions, day$casual)
##
## Pearson's product-moment correlation
##
## data: day$conditions and day$casual
## t = -6.8927, df = 729, p-value = 1.186e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3142304 -0.1780327
## sample estimates:
## cor
## -0.247353
cor.test(day$conditions, day$registered)
##
## Pearson's product-moment correlation
##
## data: day$conditions and day$registered
## t = -7.2817, df = 729, p-value = 8.566e-13
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3267321 -0.1914898
## sample estimates:
## cor
## -0.2603877
casual_users = lm(day$casual ~ day$conditions)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$conditions)
##
## Coefficients:
## (Intercept) day$conditions
## 1283.1 -311.7
reg_users = lm(day$registered ~ day$conditions)
reg_users
##
## Call:
## lm(formula = day$registered ~ day$conditions)
##
## Coefficients:
## (Intercept) day$conditions
## 4696.5 -745.6
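Because conditions enters these models as a number, each fitted line assumes that every one-step worsening of the weather code costs the same number of riders. Plugging the reported coefficients into the fitted lines makes the implied daily averages concrete for the three codes that occur in the data.

```r
# Worked example: predictions implied by the two simple regressions above,
# using the reported coefficients (intercept, slope per condition code).
casual_coef <- c(intercept = 1283.1, slope = -311.7)
reg_coef    <- c(intercept = 4696.5, slope = -745.6)

codes <- 1:3  # conditions observed in the data (no code 4 occurs)
predicted <- data.frame(
  conditions = codes,
  casual     = casual_coef[["intercept"]] + casual_coef[["slope"]] * codes,
  registered = reg_coef[["intercept"]] + reg_coef[["slope"]] * codes
)
predicted
# casual: 971.4, 659.7, 348.0; registered: 3950.9, 3205.3, 2459.7
```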
NUMBER OF USERS vs. HUMIDITY
cor.test(day$hum, day$casual)
##
## Pearson's product-moment correlation
##
## data: day$hum and day$casual
## t = -2.0854, df = 729, p-value = 0.03738
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.148691172 -0.004519522
## sample estimates:
## cor
## -0.07700788
cor.test(day$hum, day$registered)
##
## Pearson's product-moment correlation
##
## data: day$hum and day$registered
## t = -2.4697, df = 729, p-value = 0.01375
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.16252867 -0.01869851
## sample estimates:
## cor
## -0.0910886
casual_users = lm(day$casual ~ day$hum)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$hum)
##
## Coefficients:
## (Intercept) day$hum
## 1081.3 -371.2
reg_users = lm(day$registered ~ day$hum)
reg_users
##
## Call:
## lm(formula = day$registered ~ day$hum)
##
## Coefficients:
## (Intercept) day$hum
## 4282.7 -997.8
NUMBER OF USERS vs. WINDSPEED
cor.test(day$windspeed, day$casual)
##
## Pearson's product-moment correlation
##
## data: day$windspeed and day$casual
## t = -4.5905, df = 729, p-value = 5.207e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.23724343 -0.09626984
## sample estimates:
## cor
## -0.1676133
cor.test(day$windspeed, day$registered)
##
## Pearson's product-moment correlation
##
## data: day$windspeed and day$registered
## t = -6.0151, df = 729, p-value = 2.844e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2854614 -0.1472573
## sample estimates:
## cor
## -0.217449
casual_users = lm(day$casual ~ day$windspeed)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$windspeed)
##
## Coefficients:
## (Intercept) day$windspeed
## 1131 -1485
reg_users = lm(day$registered ~ day$windspeed)
reg_users
##
## Call:
## lm(formula = day$registered ~ day$windspeed)
##
## Coefficients:
## (Intercept) day$windspeed
## 4490 -4378
NUMBER OF USERS vs. ACTUAL TEMPERATURE
cor.test(day$temp, day$casual)
##
## Pearson's product-moment correlation
##
## data: day$temp and day$casual
## t = 17.472, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4900779 0.5924581
## sample estimates:
## cor
## 0.5432847
cor.test(day$temp, day$registered)
##
## Pearson's product-moment correlation
##
## data: day$temp and day$registered
## t = 17.323, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4865508 0.5894440
## sample estimates:
## cor
## 0.540012
casual_users = lm(day$casual ~ day$temp)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$temp)
##
## Coefficients:
## (Intercept) day$temp
## -161.3 2037.9
reg_users = lm(day$registered ~ day$temp)
reg_users
##
## Call:
## lm(formula = day$registered ~ day$temp)
##
## Coefficients:
## (Intercept) day$temp
## 1376 4603
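In this dataset temp is on a 0 to 1 scale (the summary above ranges from about 0.06 to 0.86), so a slope of 2037.9 is the change in casual riders across the entire scale, which is hard to picture. Multiplying each slope by the interquartile range of temp (first to third quartile, from summary(day)) gives a more concrete effect size: the difference these straight-line fits predict between a cool day and a warm day, ignoring all other variables.

```r
# Worked example: effect of moving from the 1st to the 3rd quartile of temp,
# using the simple-regression slopes reported above.
temp_q1 <- 0.33708   # 1st quartile of day$temp, from summary(day)
temp_q3 <- 0.65542   # 3rd quartile of day$temp
slopes <- c(casual = 2037.9, registered = 4603)   # from lm(... ~ day$temp)
change <- slopes * (temp_q3 - temp_q1)
round(change)   # casual 649, registered 1465
```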
NUMBER OF USERS vs. FELT TEMPERATURE
cor.test(day$felt_temp, day$casual)
##
## Pearson's product-moment correlation
##
## data: day$felt_temp and day$casual
## t = 17.499, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4907022 0.5929912
## sample estimates:
## cor
## 0.5438637
cor.test(day$felt_temp, day$registered)
##
## Pearson's product-moment correlation
##
## data: day$felt_temp and day$registered
## t = 17.514, df = 729, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4910559 0.5932932
## sample estimates:
## cor
## 0.5441918
casual_users = lm(day$casual ~ day$felt_temp)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$felt_temp)
##
## Coefficients:
## (Intercept) day$felt_temp
## -238.8 2291.5
reg_users = lm(day$registered ~ day$felt_temp)
reg_users
##
## Call:
## lm(formula = day$registered ~ day$felt_temp)
##
## Coefficients:
## (Intercept) day$felt_temp
## 1185 5210
Since every weather variable is significantly correlated with the number of riders, let's model them together. However, because the actual temperature (temp) and the felt temperature (felt_temp) are almost perfectly correlated (r = 0.99), we use only the actual temperature.
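The collinearity argument can be quantified. With only two predictors, the variance inflation factor (VIF) for each is 1 / (1 - r^2), where r is their correlation. Using the value reported earlier, including both temp and felt_temp would inflate each coefficient's variance roughly sixty-fold, far above the often-quoted rule-of-thumb threshold of 10.

```r
# Worked check of the collinearity argument: with two predictors,
# the variance inflation factor for each is 1 / (1 - r^2).
r_temp_felt <- 0.9917016   # cor(day$temp, day$felt_temp), reported above
vif <- 1 / (1 - r_temp_felt^2)
vif   # about 60.5
```

For models with more predictors, the vif() function in the car package computes the same quantity from a fitted lm object.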
Conditions only vs. weather data: casual users
casual_users = lm(day$casual ~ day$conditions)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$conditions)
##
## Coefficients:
## (Intercept) day$conditions
## 1283.1 -311.7
summary(casual_users)
##
## Call:
## lm(formula = day$casual ~ day$conditions)
##
## Residuals:
## Min 1Q Median 3Q Max
## -956.4 -460.2 -152.4 231.9 2495.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1283.09 67.73 18.944 < 2e-16 ***
## day$conditions -311.69 45.22 -6.893 1.19e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 665.7 on 729 degrees of freedom
## Multiple R-squared: 0.06118, Adjusted R-squared: 0.0599
## F-statistic: 47.51 on 1 and 729 DF, p-value: 1.186e-11
anova(casual_users)
## Analysis of Variance Table
##
## Response: day$casual
## Df Sum Sq Mean Sq F value Pr(>F)
## day$conditions 1 21056843 21056843 47.51 1.186e-11 ***
## Residuals 729 323101979 443213
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)
casual_users = lm(day$casual ~ day$temp + day$hum +day$windspeed)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$temp + day$hum + day$windspeed)
##
## Coefficients:
## (Intercept) day$temp day$hum day$windspeed
## 582.7 2048.0 -855.7 -1111.8
summary(casual_users)
##
## Call:
## lm(formula = day$casual ~ day$temp + day$hum + day$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1113.5 -327.2 -156.2 145.2 2296.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 582.7 133.4 4.369 1.43e-05 ***
## day$temp 2048.0 115.7 17.703 < 2e-16 ***
## day$hum -855.7 151.6 -5.646 2.36e-08 ***
## day$windspeed -1111.8 279.8 -3.973 7.80e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 562.6 on 727 degrees of freedom
## Multiple R-squared: 0.3313, Adjusted R-squared: 0.3286
## F-statistic: 120.1 on 3 and 727 DF, p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
##
## Response: day$casual
## Df Sum Sq Mean Sq F value Pr(>F)
## day$temp 1 101581307 101581307 320.909 < 2.2e-16 ***
## day$hum 1 7454739 7454739 23.550 1.490e-06 ***
## day$windspeed 1 4996687 4996687 15.785 7.802e-05 ***
## Residuals 727 230126089 316542
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)
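Formulas written as day$casual ~ day$temp fit correctly, but the fitted model cannot score new days: predict() looks up day$temp in the original data frame rather than in newdata, so it warns and returns the 731 original fitted values. Passing the data frame through the data argument avoids this. The sketch below uses simulated data with this report's column names (an assumption), and the helper name is hypothetical.

```r
# Sketch: the weather model written with a data argument, so that
# predict() can score new days. The data below are simulated.
fit_weather <- function(day) {
  lm(casual ~ temp + hum + windspeed, data = day)
}

set.seed(1)
n <- 100
sim <- data.frame(temp = runif(n), hum = runif(n), windspeed = runif(n, 0, 0.5))
sim$casual <- 500 + 2000 * sim$temp - 800 * sim$hum - 1000 * sim$windspeed +
  rnorm(n, sd = 100)

model <- fit_weather(sim)
# Two hypothetical days: one cool, one warm, same humidity and wind
new_days <- data.frame(temp = c(0.3, 0.7), hum = 0.6, windspeed = 0.2)
predict(model, newdata = new_days)
```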
Conditions only vs. weather data: registered users
reg_users = lm(day$registered ~ day$conditions)
reg_users
##
## Call:
## lm(formula = day$registered ~ day$conditions)
##
## Coefficients:
## (Intercept) day$conditions
## 4696.5 -745.6
summary(reg_users)
##
## Call:
## lm(formula = day$registered ~ day$conditions)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3534.9 -1055.4 -25.9 1078.6 3638.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4696.5 153.4 30.623 < 2e-16 ***
## day$conditions -745.6 102.4 -7.282 8.57e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1507 on 729 degrees of freedom
## Multiple R-squared: 0.0678, Adjusted R-squared: 0.06652
## F-statistic: 53.02 on 1 and 729 DF, p-value: 8.566e-13
anova(reg_users)
## Analysis of Variance Table
##
## Response: day$registered
## Df Sum Sq Mean Sq F value Pr(>F)
## day$conditions 1 120491318 120491318 53.023 8.566e-13 ***
## Residuals 729 1656620654 2272456
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(reg_users)
reg_users = lm(day$registered ~ day$temp + day$hum +day$windspeed)
reg_users
##
## Call:
## lm(formula = day$registered ~ day$temp + day$hum + day$windspeed)
##
## Coefficients:
## (Intercept) day$temp day$hum day$windspeed
## 3502 4577 -2244 -3695
summary(reg_users)
##
## Call:
## lm(formula = day$registered ~ day$temp + day$hum + day$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3738.0 -982.8 -160.1 978.2 3165.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3501.7 299.1 11.706 < 2e-16 ***
## day$temp 4577.5 259.5 17.641 < 2e-16 ***
## day$hum -2244.4 340.0 -6.602 7.84e-11 ***
## day$windspeed -3695.1 627.6 -5.887 5.99e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1262 on 727 degrees of freedom
## Multiple R-squared: 0.3486, Adjusted R-squared: 0.3459
## F-statistic: 129.7 on 3 and 727 DF, p-value: < 2.2e-16
anova(reg_users)
## Analysis of Variance Table
##
## Response: day$registered
## Df Sum Sq Mean Sq F value Pr(>F)
## day$temp 1 518228818 518228818 325.446 < 2.2e-16 ***
## day$hum 1 46037414 46037414 28.911 1.022e-07 ***
## day$windspeed 1 55195487 55195487 34.663 5.989e-09 ***
## Residuals 727 1157650253 1592366
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(reg_users)
conditions = lm(day$conditions ~ day$temp + day$hum + day$windspeed)
conditions
##
## Call:
## lm(formula = day$conditions ~ day$temp + day$hum + day$windspeed)
##
## Coefficients:
## (Intercept) day$temp day$hum day$windspeed
## -0.1567 -0.5250 2.5131 1.2296
summary(conditions)
##
## Call:
## lm(formula = day$conditions ~ day$temp + day$hum + day$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.91615 -0.29110 -0.07017 0.26731 3.03902
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.15674 0.09887 -1.585 0.113
## day$temp -0.52504 0.08577 -6.121 1.52e-09 ***
## day$hum 2.51310 0.11237 22.364 < 2e-16 ***
## day$windspeed 1.22962 0.20746 5.927 4.76e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4171 on 727 degrees of freedom
## Multiple R-squared: 0.4164, Adjusted R-squared: 0.414
## F-statistic: 172.9 on 3 and 727 DF, p-value: < 2.2e-16
anova(conditions)
## Analysis of Variance Table
##
## Response: day$conditions
## Df Sum Sq Mean Sq F value Pr(>F)
## day$temp 1 3.153 3.153 18.12 2.346e-05 ***
## day$hum 1 80.996 80.996 465.54 < 2.2e-16 ***
## day$windspeed 1 6.112 6.112 35.13 4.761e-09 ***
## Residuals 727 126.484 0.174
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(conditions)
casual_users = lm(day$casual ~ day$season)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$season)
##
## Coefficients:
## (Intercept) day$season
## 523.5 130.1
summary(casual_users)
##
## Call:
## lm(formula = day$casual ~ day$season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1041.7 -490.5 -151.5 260.9 2626.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 523.49 61.15 8.561 < 2e-16 ***
## day$season 130.05 22.38 5.811 9.29e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 671.7 on 729 degrees of freedom
## Multiple R-squared: 0.04427, Adjusted R-squared: 0.04296
## F-statistic: 33.77 on 1 and 729 DF, p-value: 9.288e-09
anova(casual_users)
## Analysis of Variance Table
##
## Response: day$casual
## Df Sum Sq Mean Sq F value Pr(>F)
## day$season 1 15235157 15235157 33.766 9.288e-09 ***
## Residuals 729 328923665 451198
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)
casual_users = lm(day$casual ~ day$season + day$temp +day$hum +day$windspeed)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$season + day$temp + day$hum + day$windspeed)
##
## Coefficients:
## (Intercept) day$season day$temp day$hum day$windspeed
## 546.53 26.17 2001.35 -882.45 -1055.47
summary(casual_users)
##
## Call:
## lm(formula = day$casual ~ day$season + day$temp + day$hum + day$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1108.2 -335.6 -151.6 148.3 2337.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 546.53 136.27 4.011 6.68e-05 ***
## day$season 26.17 20.44 1.281 0.200739
## day$temp 2001.35 121.25 16.505 < 2e-16 ***
## day$hum -882.45 152.94 -5.770 1.17e-08 ***
## day$windspeed -1055.47 283.14 -3.728 0.000208 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 562.4 on 726 degrees of freedom
## Multiple R-squared: 0.3328, Adjusted R-squared: 0.3292
## F-statistic: 90.55 on 4 and 726 DF, p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
##
## Response: day$casual
## Df Sum Sq Mean Sq F value Pr(>F)
## day$season 1 15235157 15235157 48.172 8.664e-12 ***
## day$temp 1 86666882 86666882 274.034 < 2.2e-16 ***
## day$hum 1 8254671 8254671 26.101 4.147e-07 ***
## day$windspeed 1 4394686 4394686 13.896 0.0002083 ***
## Residuals 726 229607427 316264
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)
casual_users = lm(day$casual ~ day$month + day$temp +day$hum +day$windspeed)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$month + day$temp + day$hum + day$windspeed)
##
## Coefficients:
## (Intercept) day$month day$temp day$hum day$windspeed
## 570.046 3.594 2036.031 -870.145 -1089.629
summary(casual_users)
##
## Call:
## lm(formula = day$casual ~ day$month + day$temp + day$hum + day$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1110.4 -327.2 -152.5 147.0 2305.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 570.046 135.307 4.213 2.84e-05 ***
## day$month 3.594 6.378 0.563 0.573291
## day$temp 2036.031 117.695 17.299 < 2e-16 ***
## day$hum -870.145 153.784 -5.658 2.20e-08 ***
## day$windspeed -1089.629 282.710 -3.854 0.000126 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 562.9 on 726 degrees of freedom
## Multiple R-squared: 0.3316, Adjusted R-squared: 0.3279
## F-statistic: 90.06 on 4 and 726 DF, p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
##
## Response: day$casual
## Df Sum Sq Mean Sq F value Pr(>F)
## day$month 1 5207277 5207277 16.435 5.578e-05 ***
## day$temp 1 96378141 96378141 304.186 < 2.2e-16 ***
## day$hum 1 7841236 7841236 24.748 8.163e-07 ***
## day$windspeed 1 4706674 4706674 14.855 0.0001264 ***
## Residuals 726 230025494 316840
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)
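In the two models above, season (1 to 4) and month (1 to 12) enter as plain numbers, so lm() fits a single straight-line trend across the codes; for month, January and December end up at opposite ends of the scale even though their weather is presumably similar. A common alternative, sketched here on simulated data (the seasonal effects below are invented for illustration), is to enter the code as a factor so that each level gets its own offset.

```r
# Sketch: season as a factor instead of a 1-4 number.
# Data are simulated for illustration; column names follow this report.
set.seed(2)
n <- 200
sim <- data.frame(season = sample(1:4, n, replace = TRUE),
                  temp = runif(n), hum = runif(n),
                  windspeed = runif(n, 0, 0.5))
sim$casual <- 500 + c(0, 300, 250, -100)[sim$season] + 2000 * sim$temp -
  850 * sim$hum - 1100 * sim$windspeed + rnorm(n, sd = 300)

linear_fit <- lm(casual ~ season + temp + hum + windspeed, data = sim)
factor_fit <- lm(casual ~ factor(season) + temp + hum + windspeed, data = sim)

coef(factor_fit)               # one offset per season relative to season 1
anova(linear_fit, factor_fit)  # does the factor coding fit significantly better?
```

Because the linear-code model is a special case of the factor model, anova() can compare them directly; the same approach applies to month.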
plot(day$weekday, day$casual)
plot(day$weekday, day$registered)
casual_users = lm(day$casual ~ day$workingday + day$weekday + day$holiday + day$temp +day$hum +day$windspeed)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$workingday + day$weekday + day$holiday +
## day$temp + day$hum + day$windspeed)
##
## Coefficients:
## (Intercept) day$workingday day$weekday day$holiday
## 1017.0 -835.4 22.8 -277.9
## day$temp day$hum day$windspeed
## 2144.5 -798.5 -1148.6
summary(casual_users)
##
## Call:
## lm(formula = day$casual ~ day$workingday + day$weekday + day$holiday +
## day$temp + day$hum + day$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1437.96 -222.00 -10.43 163.64 1678.57
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1017.041 104.125 9.768 < 2e-16 ***
## day$workingday -835.431 34.125 -24.481 < 2e-16 ***
## day$weekday 22.803 7.703 2.960 0.00318 **
## day$holiday -277.935 95.316 -2.916 0.00366 **
## day$temp 2144.543 85.334 25.131 < 2e-16 ***
## day$hum -798.516 111.828 -7.141 2.27e-12 ***
## day$windspeed -1148.571 206.140 -5.572 3.56e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 414.4 on 724 degrees of freedom
## Multiple R-squared: 0.6387, Adjusted R-squared: 0.6357
## F-statistic: 213.3 on 6 and 724 DF, p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
##
## Response: day$casual
## Df Sum Sq Mean Sq F value Pr(>F)
## day$workingday 1 92361829 92361829 537.716 < 2.2e-16 ***
## day$weekday 1 2121526 2121526 12.351 0.0004681 ***
## day$holiday 1 1792826 1792826 10.438 0.0012906 **
## day$temp 1 111988413 111988413 651.979 < 2.2e-16 ***
## day$hum 1 6202415 6202415 36.109 2.954e-09 ***
## day$windspeed 1 5332509 5332509 31.045 3.557e-08 ***
## Residuals 724 124359304 171767
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(casual_users)
reg_users = lm(day$registered ~ day$workingday + day$weekday + day$holiday + day$temp +day$hum +day$windspeed)
reg_users
##
## Call:
## lm(formula = day$registered ~ day$workingday + day$weekday +
## day$holiday + day$temp + day$hum + day$windspeed)
##
## Coefficients:
## (Intercept) day$workingday day$weekday day$holiday
## 2873.22 907.81 28.87 -221.32
## day$temp day$hum day$windspeed
## 4455.64 -2274.75 -3659.70
summary(reg_users)
##
## Call:
## lm(formula = day$registered ~ day$workingday + day$weekday +
## day$holiday + day$temp + day$hum + day$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4094.8 -943.0 -32.0 865.3 2874.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2873.22 297.77 9.649 < 2e-16 ***
## day$workingday 907.81 97.59 9.302 < 2e-16 ***
## day$weekday 28.87 22.03 1.311 0.190
## day$holiday -221.32 272.58 -0.812 0.417
## day$temp 4455.64 244.04 18.258 < 2e-16 ***
## day$hum -2274.75 319.80 -7.113 2.73e-12 ***
## day$windspeed -3659.70 589.51 -6.208 9.03e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1185 on 724 degrees of freedom
## Multiple R-squared: 0.4277, Adjusted R-squared: 0.423
## F-statistic: 90.18 on 6 and 724 DF, p-value: < 2.2e-16
anova(reg_users)
## Analysis of Variance Table
##
## Response: day$registered
## Df Sum Sq Mean Sq F value Pr(>F)
## day$workingday 1 164133237 164133237 116.8411 < 2.2e-16 ***
## day$weekday 1 3845951 3845951 2.7378 0.09843 .
## day$holiday 1 1451854 1451854 1.0335 0.30967
## day$temp 1 488776353 488776353 347.9439 < 2.2e-16 ***
## day$hum 1 47722372 47722372 33.9720 8.417e-09 ***
## day$windspeed 1 54138590 54138590 38.5395 9.034e-10 ***
## Residuals 724 1017043617 1404756
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(reg_users)
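One reason weekday and holiday may add little once workingday is in the model is that, per the UCI dataset description, workingday is 1 exactly when a day is neither a weekend day nor a holiday, so it is already a function of the other two. The first ten records shown by str(day) are consistent with this, and the hypothetical helper below reproduces them.

```r
# Sketch: workingday derived from weekday and holiday
# (weekday coded 0 = Sunday ... 6 = Saturday).
derive_workingday <- function(weekday, holiday) {
  as.integer(!(weekday %in% c(0, 6)) & (holiday == 0))
}

# First ten records from str(day): weekday, and holiday (all 0)
weekday10 <- c(6, 0, 1, 2, 3, 4, 5, 6, 0, 1)
holiday10 <- rep(0, 10)
derive_workingday(weekday10, holiday10)
# returns 0 0 1 1 1 1 1 0 0 1, matching workingday in str(day)
```

With the real data, all(derive_workingday(day$weekday, day$holiday) == day$workingday) checks the relationship for every day.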
casual_users = lm(day$casual ~ day$workingday + day$temp +day$hum +day$windspeed)
casual_users
##
## Call:
## lm(formula = day$casual ~ day$workingday + day$temp + day$hum +
## day$windspeed)
##
## Coefficients:
## (Intercept) day$workingday day$temp day$hum
## 1063.6 -806.6 2149.5 -812.7
## day$windspeed
## -1145.3
summary(casual_users)
##
## Call:
## lm(formula = day$casual ~ day$workingday + day$temp + day$hum +
## day$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1345.19 -217.83 -10.19 170.03 1769.42
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1063.55 101.37 10.492 < 2e-16 ***
## day$workingday -806.63 33.41 -24.143 < 2e-16 ***
## day$temp 2149.52 86.32 24.901 < 2e-16 ***
## day$hum -812.74 112.98 -7.194 1.57e-12 ***
## day$windspeed -1145.31 208.55 -5.492 5.51e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 419.3 on 726 degrees of freedom
## Multiple R-squared: 0.6291, Adjusted R-squared: 0.6271
## F-statistic: 307.9 on 4 and 726 DF, p-value: < 2.2e-16
anova(casual_users)
## Analysis of Variance Table
##
## Response: day$casual
## Df Sum Sq Mean Sq F value Pr(>F)
## day$workingday 1 92361829 92361829 525.333 < 2.2e-16 ***
## day$temp 1 112350448 112350448 639.023 < 2.2e-16 ***
## day$hum 1 6501869 6501869 36.981 1.928e-09 ***
## day$windspeed 1 5302333 5302333 30.158 5.510e-08 ***
## Residuals 726 127642343 175816
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2))
plot(casual_users)
reg_users = lm(day$registered ~ day$workingday + day$temp +day$hum +day$windspeed)
reg_users
##
## Call:
## lm(formula = day$registered ~ day$workingday + day$temp + day$hum +
## day$windspeed)
##
## Coefficients:
## (Intercept) day$workingday day$temp day$hum
## 2945.8 932.4 4460.2 -2294.1
## day$windspeed
## -3656.4
summary(reg_users)
##
## Call:
## lm(formula = day$registered ~ day$workingday + day$temp + day$hum +
## day$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4079.2 -912.9 -41.4 868.6 2873.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2945.82 286.66 10.276 < 2e-16 ***
## day$workingday 932.44 94.48 9.869 < 2e-16 ***
## day$temp 4460.19 244.11 18.271 < 2e-16 ***
## day$hum -2294.09 319.48 -7.181 1.72e-12 ***
## day$windspeed -3656.39 589.75 -6.200 9.48e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1186 on 726 degrees of freedom
## Multiple R-squared: 0.4256, Adjusted R-squared: 0.4225
## F-statistic: 134.5 on 4 and 726 DF, p-value: < 2.2e-16
anova(reg_users)
## Analysis of Variance Table
##
## Response: day$registered
## Df Sum Sq Mean Sq F value Pr(>F)
## day$workingday 1 164133237 164133237 116.743 < 2.2e-16 ***
## day$temp 1 489324633 489324633 348.043 < 2.2e-16 ***
## day$hum 1 48906300 48906300 34.786 5.641e-09 ***
## day$windspeed 1 54041429 54041429 38.438 9.478e-10 ***
## Residuals 726 1020706374 1405932
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(2,2))
plot(reg_users)
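The final models drop weekday and holiday. Because each four-predictor model is nested in the corresponding six-predictor model, anova(reduced, full) gives an F-test of whether the two dropped terms jointly improve the fit. The sketch uses simulated data with this report's column names (an assumption); with the real data, both fitted models would need to be kept under separate names rather than overwriting reg_users.

```r
# Sketch: F-test comparing nested linear models with anova().
# Simulated data stand in for day.csv; column names follow this report.
compare_nested <- function(day) {
  reduced <- lm(registered ~ workingday + temp + hum + windspeed, data = day)
  full    <- lm(registered ~ workingday + weekday + holiday +
                  temp + hum + windspeed, data = day)
  anova(reduced, full)
}

set.seed(3)
n <- 300
sim <- data.frame(weekday = sample(0:6, n, replace = TRUE),
                  holiday = 0,
                  temp = runif(n), hum = runif(n),
                  windspeed = runif(n, 0, 0.5))
sim$holiday[c(10, 100, 200)] <- 1   # a few holidays
sim$workingday <- as.integer(!(sim$weekday %in% c(0, 6)) & (sim$holiday == 0))
sim$registered <- 3000 + 900 * sim$workingday + 4500 * sim$temp -
  2300 * sim$hum - 3700 * sim$windspeed + rnorm(n, sd = 1000)

tab <- compare_nested(sim)
tab
```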